
@lemire lemire commented Nov 10, 2020

If you have binary floating-point numbers, you ought to serialize them using no more than 17 digits; any more and you are being wasteful. Our code is optimized for that scenario. However, if someone has data with many digits per floating-point number, our performance is somewhat lower. This PR is better if you expect to be processing long significands.

Configuration: AMD Rome Processor (GCC 10)

| model | before | after (this PR) | gain/loss |
| --- | --- | --- | --- |
| uniform | 1132.06 MB/s | 965.50 MB/s | -15% |
| uniform/concise | 951.12 MB/s | 863.78 MB/s | -10% |
| canada | 802.83 MB/s | 768.40 MB/s | -5% |
| big_ints | 370.35 MB/s | 876.81 MB/s | +136% |

The slowdown is caused by lower IPC (instructions per cycle), down by up to 10%, coupled with an increase in the number of instructions (about 5%). It does not cause more branch misses. Depending on the benchmark, this branch needs between 10 and 20 extra instructions per float in the general case.

See #51 for a different trade-off.

@lemire lemire changed the title This is an experimental branch that might lead to some faster performance when you expect long mantissa This is an experimental branch that might lead to some faster performance when you expect long significand (more than 19 digits) Jan 5, 2021
@lemire lemire closed this Jan 8, 2021
@lemire lemire deleted the lemire/experimental branch July 7, 2021 19:38